Resampling Methods

Departing from strict theoretical principles, basing inferences on statistics obtained from repeated resampling of the same sample

Statistical Methods
Resampling
Author

Kobus Esterhuysen

Published

June 21, 2017



Parametric procedures compare observed statistics to sampling distributions that rest on solid theoretical principles. However, the data must always satisfy certain conditions before these procedures can be applied. Resampling is a revolutionary procedure because it departs from strict theoretical principles and simply bases its inferences on statistics obtained from repeated resampling of the same sample. In other words, there is no need to satisfy parametric conditions on the data. Even so, resampling is not the only technique that allows a researcher this relaxation of conditions. There is a wide variety of so-called non-parametric methods. When parametric conditions are not satisfied, a valid question becomes: should one first attempt resampling techniques, or is it better to start with the other non-parametric methods?
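To make "repeated resampling of the same sample" concrete, here is a minimal sketch of a percentile bootstrap confidence interval for the mean. It uses only the Python standard library. The function name `bootstrap_ci`, its parameters, and the data values are illustrative assumptions, not something taken from the cited literature.

```python
import random
import statistics

def bootstrap_ci(sample, stat=statistics.mean, n_resamples=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat` (illustrative sketch)."""
    rng = random.Random(seed)
    n = len(sample)
    # Each resample draws n values from the original sample, with replacement,
    # and the statistic is recomputed on every resample.
    estimates = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_resamples))
    # The interval is read off the empirical percentiles of those estimates.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]

# Hypothetical, right-skewed data used only for illustration.
data = [1.2, 0.4, 3.8, 0.9, 7.5, 2.1, 0.3, 1.7, 5.2, 0.8]
low, high = bootstrap_ci(data)
print(f"sample mean = {statistics.mean(data):.2f}")
print(f"95% bootstrap CI for the mean: ({low:.2f}, {high:.2f})")
```

No normality assumption enters the calculation. The only input is the sample itself, which is exactly why the quality of that sample matters so much.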

As mentioned, there is a large group of other non-parametric methods with a great diversity of approaches. It pays to have at least a rudimentary knowledge of the theory behind an approach before applying it to a problem, which takes effort when the approaches are so varied. In contrast, one may simply use a resampling method. The basic principles of resampling are not hard to grasp, and there is little diversity of theory among its variants.

As pointed out by Chong Ho Yu in his 2002 article, Resampling methods: Concepts, Applications, and Justification (see https://scholarworks.umass.edu/cgi/viewcontent.cgi?article=1128&context=pare), there are four main types of resampling methods: permutation test, cross-validation, jackknife, and bootstrap. Even so, the underpinnings of the methods are not far apart. Yu makes a point of this too: “The principles of cross-validation, Jackknife, and bootstrap are very similar, but bootstrap overshadows the others for it is a more thorough procedure in the sense that it draws many more sub-samples than the others.” In other words, three of the four main classes are very similar, and the bootstrap stands out among those three mainly because it draws many more resamples.
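The similarity between the jackknife and the bootstrap, and the difference in how many sub-samples each draws, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the data are hypothetical, and the statistic is the sample mean.

```python
import math
import random
import statistics

def jackknife_se(sample, stat=statistics.mean):
    """Jackknife standard error: exactly n leave-one-out sub-samples."""
    n = len(sample)
    loo = [stat(sample[:i] + sample[i + 1:]) for i in range(n)]
    loo_mean = sum(loo) / n
    return math.sqrt((n - 1) / n * sum((v - loo_mean) ** 2 for v in loo))

def bootstrap_se(sample, stat=statistics.mean, n_resamples=2000, seed=0):
    """Bootstrap standard error: as many with-replacement resamples as requested."""
    rng = random.Random(seed)
    n = len(sample)
    estimates = [stat(rng.choices(sample, k=n)) for _ in range(n_resamples)]
    return statistics.stdev(estimates)

# Hypothetical data used only for illustration.
data = [1.2, 0.4, 3.8, 0.9, 7.5, 2.1, 0.3, 1.7, 5.2, 0.8]
print(f"jackknife SE ({len(data)} sub-samples): {jackknife_se(data):.3f}")
print(f"bootstrap SE (2000 resamples):  {bootstrap_se(data):.3f}")
```

Both functions follow the same recipe: recompute the statistic on sub-samples, then measure how much it varies. They differ only in how the sub-samples are formed and how many there are. For the mean, the jackknife value coincides with the textbook standard error, the sample standard deviation divided by the square root of n.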

In a sense, the resampling methods represent a “generic” approach that can stand in for a sub-group of non-parametric methods. This offers a valuable economy of expression, which is always welcome in a complicated discipline such as statistics. This in itself, to me, is a good reason to use resampling as a first resort, rather than other non-parametric methods.

Resampling techniques have some definite advantages. Often there is no clear-cut answer on whether parametric conditions hold, or on the extent to which they hold. The researcher’s mind may be eased by simply using a resampling method. Furthermore, the concepts behind resampling are clean and simple. There is no need to have a mathematical background or other special smarts. As a matter of fact, some studies have shown that students new to statistics fared better when taught using resampling techniques than students who came through traditional statistics schooling (cited in Rudner & Shafer, 1992).

The most fundamental requirement for parametric or other procedures is that the sample be representative or random. It seems that even this condition might be circumvented when using a resampling method (Edgington 1995), although not all researchers agree on this point. Another advantage arises in studies with a small sample size. Even if parametric conditions are satisfied, the test may suffer from low power. In this case the bootstrap method could be applied to treat the small sample as a virtual population and generate a sufficient number of observations from it. Similarly, a resampling technique may come to the rescue when the sample is so “large” that the test is overpowered.
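One resampling method that suits small samples is the permutation test, which builds its reference distribution by reshuffling group labels instead of appealing to a theoretical sampling distribution. The sketch below assumes a simple two-group comparison of means. The function name, the group names, and the measurements are hypothetical and serve only as an illustration.

```python
import random
import statistics

def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in means (illustrative sketch)."""
    rng = random.Random(seed)
    observed = statistics.mean(group_a) - statistics.mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Reshuffle the pooled values and split them into two pseudo-groups.
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (extreme + 1) / (n_permutations + 1)

# Hypothetical measurements for two small groups.
treatment = [24.1, 27.3, 22.8, 29.5, 26.0, 28.4]
control = [21.0, 23.2, 19.8, 24.5, 20.7, 22.1]
observed, p_value = permutation_test(treatment, control)
print(f"observed difference in means = {observed:.2f}")
print(f"estimated two-sided p-value  = {p_value:.4f}")
```

The p-value is the share of reshuffles that produce a difference at least as large as the observed one. With only six observations per group, the full set of 924 possible splits could also be enumerated exactly; random shuffling is used here because it scales to larger samples.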

As always, there is the other side of the coin. One criticism of resampling was voiced by S. E. Fienberg when he said: “You’re trying to get something for nothing. You use the same numbers over and over again until you get an answer that you can’t get any other way. In order to do that, you have to assume something, and you may live to regret that hidden assumption later on” (cited in Peterson, 1991, p. 57). Another criticism finds fault with the fact that the whole technique of resampling is based on a single sample (no surprise here). The point is that the generalization cannot go beyond that specific sample. Other critics are concerned about bias in the data that was collected. Their point is that resampling would simply repeat and magnify the bias. One more criticism is that resampling may be less accurate than traditional parametric methods unless sufficient experimental trials are performed. This is not a very convincing argument given today’s landscape of low-cost computing.
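The last criticism, that accuracy depends on performing enough trials, can be checked with a small experiment. The sketch below repeats a bootstrap standard-error estimate several times for different numbers of resamples and reports how much the estimates disagree with each other. The function names, the resample counts, and the data are assumptions chosen for illustration.

```python
import random
import statistics

def bootstrap_se_once(sample, n_resamples, rng):
    """One bootstrap estimate of the standard error of the mean."""
    n = len(sample)
    means = [statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)]
    return statistics.stdev(means)

def spread_of_estimates(sample, n_resamples, repeats=20, seed=0):
    """Standard deviation of repeated bootstrap SE estimates (Monte Carlo noise)."""
    rng = random.Random(seed)
    estimates = [bootstrap_se_once(sample, n_resamples, rng) for _ in range(repeats)]
    return statistics.stdev(estimates)

# Hypothetical data used only for illustration.
data = [1.2, 0.4, 3.8, 0.9, 7.5, 2.1, 0.3, 1.7, 5.2, 0.8]
for n_resamples in (20, 200, 2000):
    noise = spread_of_estimates(data, n_resamples)
    print(f"{n_resamples:>5} resamples: spread of SE estimates = {noise:.4f}")
```

The run-to-run noise shrinks as the number of resamples grows, so this source of inaccuracy can be reduced by spending more computation. More resamples do nothing, however, about bias already present in the original sample.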

To summarize, the advantages mentioned above should be convincing as to the virtues of resampling methods in general. More specifically, the idea of having a “generic” approach when a non-parametric method is called for, rather than a wide variety of non-parametric approaches, sounds appealing. This allows for a valuable economy of expression in the non-parametric area. Finally, the attractive landscape of extremely affordable computing resources leaves no excuse (in my mind) for not reaching first for a resampling method when a non-parametric method is called for.

References:

Yu, C. H. Resampling methods: Concepts, Applications, and Justification. https://scholarworks.umass.edu/cgi/viewcontent.cgi?article=1128&context=pare